Global Induction of Decision Trees
Abstract
Decision trees are, alongside decision rules, one of the most popular forms of knowledge representation in the Knowledge Discovery in Databases process (Fayyad, Piatetsky-Shapiro, Smyth & Uthurusamy, 1996), and implementations of the classical decision tree induction algorithms are included in the majority of data mining systems. The hierarchical structure of a tree-based classifier, in which the tests at consecutive nodes are applied one after another, closely resembles the human way of making decisions. This makes decision trees natural and easy to understand even for an inexperienced analyst. The popularity of the decision tree approach can also be explained by its ease of application, fast classification and, perhaps most importantly, its effectiveness. Two main types of decision trees can be distinguished by the type of tests in non-terminal nodes: univariate and multivariate decision trees. In the first group, a single attribute is used in each test. For a continuous-valued feature, an inequality test with binary outcomes is usually applied, and for a nominal attribute, mutually exclusive groups of attribute values are associated with the outcomes. A good representative of univariate inducers is the well-known C4.5 system developed by Quinlan (1993). In univariate trees, a split is equivalent to partitioning the feature space with an axis-parallel hyper-plane. If the decision boundaries of a particular dataset are not axis-parallel, using such tests may lead to an overcomplicated classifier. This situation is known as the “staircase effect”. The problem can be mitigated by applying more sophisticated multivariate tests, in which more than one feature can be taken into account. The most common form of such tests is an oblique split, which is based on a linear combination of features (a hyper-plane).
A decision tree that applies only oblique tests is often called oblique or linear, whereas heterogeneous trees with univariate, linear and other multivariate (e.g., instance-based) tests can be called mixed decision trees (Llora & Wilson, 2004). It should be emphasized that the computational complexity of multivariate induction is generally significantly higher than that of univariate induction. CART (Breiman, Friedman, Olshen & Stone, 1984) and OC1 (Murthy, Kasif & Salzberg, 1994) are well-known examples of multivariate systems.
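The difference between the two kinds of tests, and the reason a diagonal boundary is hard for axis-parallel splits, can be illustrated with a minimal Python sketch. The function names, the 5x5 toy grid and the candidate thresholds are assumptions made only for this illustration; they are not taken from C4.5, CART, OC1 or any other system mentioned above.

```python
from typing import Sequence


def univariate_test(x: Sequence[float], feature: int, threshold: float) -> bool:
    """Axis-parallel test: compare one feature with a threshold."""
    return x[feature] <= threshold


def oblique_test(x: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """Oblique test: compare a linear combination of features with a threshold."""
    return sum(w * v for w, v in zip(weights, x)) <= threshold


# Synthetic toy data (illustrative only): points on a 5x5 grid whose true
# class boundary is the diagonal x0 <= x1, i.e. not axis-parallel.
points = [(float(i), float(j)) for i in range(5) for j in range(5)]
labels = [p[0] <= p[1] for p in points]


def accuracy(predictions: Sequence[bool]) -> float:
    """Fraction of predictions that agree with the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# A single oblique test, x0 - x1 <= 0, reproduces the diagonal boundary.
oblique_acc = accuracy([oblique_test(p, (1.0, -1.0), 0.0) for p in points])

# The best single axis-parallel test over both features and a few
# hypothetical candidate thresholds cannot separate the classes alone.
best_univariate_acc = max(
    accuracy([univariate_test(p, f, t) for p in points])
    for f in (0, 1)
    for t in (0.5, 1.5, 2.5, 3.5)
)

print(f"oblique: {oblique_acc:.2f}, best single univariate: {best_univariate_acc:.2f}")
```

In this sketch one oblique test separates the classes, whereas a univariate tree would have to stack several axis-parallel tests to approximate the diagonal, which is the staircase effect described above.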
Similar resources
Evolutionary Induction of Cost-Sensitive Decision Trees
In the paper, a new method for cost-sensitive learning of decision trees is proposed. Our approach consists in extending the existing evolutionary algorithm (EA) for global induction of decision trees. In contrast to the classical top-down methods, our system searches for the whole tree at once. We propose a new fitness function which allows the algorithm to minimize expected cost of clas...
Global Induction of Decision Trees: From Parallel Implementation to Distributed Evolution
In most data mining systems, decision trees are induced in a top-down manner. This greedy method is fast but can fail for certain classification problems. As an alternative, a global approach based on evolutionary algorithms (EAs) can be applied. We developed the Global Decision Tree (GDT) system, which learns a tree structure and tests in one run of the EA. Specialized genetic operators are used,...
Evolutionary induction of global model trees with specialized operators and memetic extensions
Metaheuristics, such as evolutionary algorithms (EAs), have been successfully applied to the problem of decision tree induction. Recently, an EA was proposed to evolve model trees, which are a particular type of decision tree that is employed to solve regression problems. However, there is a need to specialize the EAs in order to exploit the full potential of evolutionary induction. The main co...
A Memetic Algorithm for Global Induction of Decision Trees
In the paper, a new memetic algorithm for decision tree learning is presented. The proposed approach consists in extending an existing evolutionary approach for global induction of classification trees. In contrast to the standard top-down methods, it searches for the optimal univariate tree by evolving a population of trees. Specialized genetic operators are selectively applied to modify both ...
Similarity Constraints in Beam-search Building of Predictive Clustering Trees
We investigate how inductive databases (IDBs) can support global models, such as decision trees. We focus on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for prediction and clustering, two of the most common data mining tasks. Regular PCT induction builds PCTs top-down, using a greedy algorithm, similar to that of C4.5. We propose a new induction algorithm ...
Knowledge Extraction from Metacognitive Reading Strategies Data Using Induction Trees
The assessment of students’ metacognitive knowledge and skills about reading is critical in determining their ability to read academic texts and do so with comprehension. In this paper, we used induction trees to extract metacognitive knowledge about reading from a reading strategies dataset obtained from a group of 1636 undergraduate college students. Using a C4.5 algorithm, we constructed dec...
Publication date: 2009